Abstract 

We develop a method to perform model averaging in two-stage linear regression systems subject to endogeneity. Our method extends an existing Gibbs sampler for instrumental variables to incorporate a component of model uncertainty. Direct evaluation of model probabilities is intractable in this setting. We show that by nesting model moves inside the Gibbs sampler, model comparison can be performed via conditional Bayes factors, leading to straightforward calculations. This new Gibbs sampler is only slightly more involved than the original algorithm and exhibits no evidence of mixing difficulties. We conclude with a study of two different modeling challenges: incorporating uncertainty into the determinants of macroeconomic growth, and estimating a demand function by instrumenting retail prices with wholesale prices.
1 Introduction 



We consider the problem of incorporating instrument and covariate uncertainty into the Bayesian estimation of an instrumental variable (IV) regression system. The concepts of model uncertainty and model averaging have received widespread attention in the economics literature for the standard linear regression framework (see, e.g., Fernandez et al. (2001), Eicher et al. (2011) and references therein). However, these frameworks do not directly address endogeneity, and only recently has attention been paid to this important component. Unfortunately, the nested nature of IV estimation renders direct model comparison exceedingly difficult.



This has led to a number of different approaches. Durlauf et al. (2008), Cohen-Cole et al. (2009) and Durlauf et al. (2011) consider approximations of marginal likelihoods in a framework similar to two-stage least squares. Lenkoski et al. (2011) continue this development with the two-stage Bayesian model averaging (2SBMA) methodology, which uses a framework developed by Kleibergen and Zivot (2003) to propose a two-stage extension of the unit information prior (Kass and Wasserman, 1995). Similar approaches in closely related models have been developed by Morales-Benito (2009) and Chen et al. (2009).



Koop et al. (2012) develop a fully Bayesian methodology that does not utilize approximations to integrated likelihoods. They develop a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm (Green, 1995), which extends the methodology of Holmes et al. (2002). The authors then show that the method is able to handle a variety of priors, including those of Dreze (1976), Kleibergen and van Dijk (1998) and Strachan and Inder (2004). However, the authors note that direct application of RJMCMC leads to significant mixing difficulties, and they rely on a complicated model move procedure, similar to simulated tempering, to escape local model modes.

We propose an alternative solution to this problem, which we term Instrumental Variable Bayesian Model Averaging (IVBMA). Our method builds on a Gibbs sampler for the IV framework discussed in Rossi et al. (2006). Because direct model comparisons are intractable, we make use of the notion of a conditional Bayes factor (CBF), first discussed by Dickey and Gunel (1978). The CBF compares two models in a nested hierarchical system, conditional on parameters not influenced by the models under consideration. We show that the CBF for both first- and second-stage models is exceedingly straightforward to calculate and essentially reduces to the normalizing constants of a multivariate normal distribution.

This leads to a procedure in which model moves are embedded in a Gibbs sampler, which we term MC3-within-Gibbs. Owing to this order of operations, IVBMA is shown to be only slightly more involved than the original Gibbs sampler that does not incorporate model uncertainty: a three-step procedure becomes a five-step procedure, and IVBMA appears to have few issues regarding mixing. Furthermore, the routines discussed here are contained in the R package ivbma, which can be freely downloaded from the Comprehensive R Archive Network (CRAN).

The article proceeds as follows. The basic framework we consider and the Gibbs sampler that ignores model uncertainty are discussed in Section 2. Section 3 reviews the concept of model uncertainty, introduces the notion of CBFs and derives the conditional model probabilities used by IVBMA. Section 4 presents two data analyses. The first is the classic problem of modeling uncertainty in macroeconomic growth determinants, which has proven a testing ground for BMA in economics. Second, we consider the problem of modeling an uncertain demand function, in particular the volume of demand for margarine in Denver, Colorado, between January 1993 and March 1995. In Section 5 we conclude. Appendices give details of the calculations outlined in Sections 2 and 3.

2 Methodology 

2.1 Description of the Model 

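We consider the classic two-stage endogenous variable model:

$$Y = X\beta + W\gamma + \epsilon \qquad (2.1)$$

$$X = Z\delta + W\tau + \eta \qquad (2.2)$$

with

$$\begin{pmatrix} \epsilon_i \\ \eta_i \end{pmatrix} \sim \mathcal{N}_2(0, \Sigma) \qquad (2.3)$$

and

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}, \qquad \sigma_{12} = \sigma_{21} \neq 0.$$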

In what follows we restrict the response variable $Y$ and the endogenous explanatory factor $X$ to be $n \times 1$ vectors.¹ $W$ denotes an $n \times p$ matrix of further explanatory variables with $p \times 1$ parameter vectors $\gamma$ and $\tau$. The instrumental variables are described by the $n \times q$ matrix $Z$ with $\delta$ a $q \times 1$ parameter vector. The coefficient $\beta$ is a scalar.

¹ In the Conclusions section we outline the straightforward steps necessary to incorporate multiple endogenous variables.



Due to (2.3) the error terms are homoscedastic and correlated in each component, since $\mathrm{Cov}(\epsilon_i, \eta_i) = \sigma_{12} = \sigma_{21} \neq 0$ for all observations. This requires joint estimation of the system (2.1) and (2.2) in order to draw appropriate inference for the parameters in the outcome equation (2.1). The assumption of bivariate normality in (2.3) is helpful in deriving a fast algorithm for posterior determination; in the Conclusions section we discuss how this may be relaxed.
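To make the endogeneity concrete, the following sketch draws one data set from (2.1)-(2.3) in Python with NumPy. The function name `simulate_iv`, the choice to make the first column of $W$ an intercept, and the parameter values in the usage lines are illustrative assumptions, not taken from the paper. Because $\sigma_{12} \neq 0$, $X$ is correlated with $\epsilon$, so an ordinary least squares fit of $Y$ on $[X\ W]$ is biased for $\beta$.

```python
import numpy as np


def simulate_iv(n, beta, gamma, delta, tau, Sigma, seed=0):
    """Draw (Y, X, W, Z) from the system (2.1)-(2.3).

    gamma, tau: length-p arrays (first column of W is an intercept);
    delta: length-q array; Sigma: 2 x 2 covariance of (eps_i, eta_i).
    """
    rng = np.random.default_rng(seed)
    p, q = len(gamma), len(delta)
    W = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
    Z = rng.standard_normal((n, q))
    errors = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    eps, eta = errors[:, 0], errors[:, 1]
    X = Z @ delta + W @ tau + eta          # instrument equation (2.2)
    Y = beta * X + W @ gamma + eps         # outcome equation (2.1)
    return Y, X, W, Z, eps, eta


if __name__ == "__main__":
    Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
    Y, X, W, Z, eps, eta = simulate_iv(
        20000, 1.0, np.array([0.5, -0.3]), np.array([1.0, 0.5]),
        np.array([0.0, 0.5]), Sigma, seed=1)
    V = np.column_stack([X, W])
    beta_ols = np.linalg.lstsq(V, Y, rcond=None)[0][0]
    print("OLS estimate of beta:", beta_ols)   # biased away from 1.0
```

With $\sigma_{12} = 0.6$ and these values, the OLS estimate of $\beta$ should land noticeably above the true value of 1; this is the bias that joint estimation of (2.1) and (2.2) is meant to remove.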

2.2 Calculation of the Conditional Posterior Distributions 

In this paper we focus solely on the Bayesian estimation of the IV framework detailed above. We consider a prior framework detailed in Rossi et al. (2006), extended to the multivariate setting, as it lends itself to quick posterior estimation through Gibbs sampling.

In order to adequately explain the CBF calculations we perform in Section 3, it is helpful to review the derivation of the conditional posterior distributions. The following three subsections present the calculation of the posterior distribution $\mathrm{pr}(\theta \mid \mathcal{D})$ of the parameter vector $\theta = (\beta, \gamma, \delta, \tau, \Sigma)$ of our model (2.1)-(2.3), conditional on the data $\mathcal{D} = (Y, X, W, Z)$.

The Gibbs sampler we outline below divides $\theta$ into three subvectors, $\rho = [\beta\ \gamma']'$, $\lambda = [\delta'\ \tau']'$ and $\Sigma$, with $\rho \in \mathbb{R}^{1+p}$, $\lambda \in \mathbb{R}^{q+p}$ and $\Sigma \in \mathcal{P}_2$, where $\mathcal{P}_2$ denotes the cone of $2 \times 2$ positive definite matrices. Appendix A gives full details of the conditional distributions derived below.

2.2.1 Step 1: The Conditional Posterior Distribution of $\rho$

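Assuming a standard normal prior for the second-stage regressors, $\rho \sim \mathcal{N}(0, I_{1+p})$, we have

$$\rho \mid \lambda, \Sigma, \mathcal{D} \sim \mathcal{N}(\hat{\rho}, H^{-1}), \qquad (2.4)$$

where $\hat{\rho} = \frac{1}{\xi} H^{-1} V'\tilde{Y}$ and $H = I_{1+p} + \frac{1}{\xi} V'V$, with $\tilde{Y} = Y - \frac{\sigma_{12}}{\sigma_{22}}\eta$, $V = [X\ W]$ and $\xi = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}$. Details are given in Appendix A.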
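Step 1 amounts to a Bayesian linear regression of $\tilde{Y}$ on $V$ with known error variance $\xi$, where $\tilde{Y}$ and $\xi$ come from the normal identity $\epsilon \mid \eta \sim \mathcal{N}(\frac{\sigma_{12}}{\sigma_{22}}\eta, \xi I_n)$. The Python/NumPy sketch below computes $\hat{\rho}$ and $H$ from the current residual $\eta = X - Z\delta - W\tau$ and draws from (2.4). The function names and the Cholesky-based sampler are our own illustrative choices, not the ivbma package's code.

```python
import numpy as np


def rho_conditional(Y, X, W, eta, Sigma):
    """Mean and precision of the Step 1 conditional (2.4).

    eta is the first-stage residual X - Z delta - W tau implied by the current
    lambda. Returns (rho_hat, H) with rho | lambda, Sigma, D ~ N(rho_hat, H^{-1}).
    """
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    xi = s11 - s12**2 / s22              # Var(eps_i | eta_i)
    Y_tilde = Y - (s12 / s22) * eta      # subtract E[eps | eta]
    V = np.column_stack([X, W])          # V = [X W]
    H = np.eye(V.shape[1]) + V.T @ V / xi
    rho_hat = np.linalg.solve(H, V.T @ Y_tilde / xi)
    return rho_hat, H


def sample_rho(Y, X, W, eta, Sigma, rng):
    """One draw from N(rho_hat, H^{-1}) via the Cholesky factor H = C C'."""
    rho_hat, H = rho_conditional(Y, X, W, eta, Sigma)
    C = np.linalg.cholesky(H)
    z = rng.standard_normal(len(rho_hat))
    return rho_hat + np.linalg.solve(C.T, z)   # C'^{-1} z has covariance H^{-1}
```

Sampling through the Cholesky factor of the precision $H$ avoids forming $H^{-1}$ explicitly.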



2.2.2 Step 2: The Conditional Posterior Distribution of $\lambda$

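Assuming a standard normal prior for the first-stage regressors, $\lambda \sim \mathcal{N}(0, I_{q+p})$, we have

$$\lambda \mid \rho, \Sigma, \mathcal{D} \sim \mathcal{N}(\hat{\lambda}, \Omega^{-1}), \qquad (2.5)$$

where $\hat{\lambda} = \Omega^{-1} T'S$ and $\Omega = I_{q+p} + T'T$. Here, $S$ is a $2n \times 1$ vector formed from $Y$, $X$ and $\Sigma$, and $T$ is a $2n \times (p+q)$ matrix formed from $W$, $Z$ and $\Sigma$, whose construction is outlined in Appendix A.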
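Appendix A is not part of this section, so the construction below is an assumption: one standard way to obtain a $2n \times 1$ vector $S$ and a $2n \times (p+q)$ matrix $T$ is to stack $Y - X\beta - W\gamma = \epsilon$ on top of $X = [Z\ W]\lambda + \eta$ and premultiply by $C^{-1} \otimes I_n$, where $\Sigma = CC'$ is the Cholesky factorization, so that the stacked errors become independent $\mathcal{N}(0, 1)$. The Python/NumPy sketch (hypothetical function names) applies the $2 \times 2$ factor blockwise instead of forming the $2n \times 2n$ Kronecker product.

```python
import numpy as np


def build_S_T(Y, X, W, Z, beta, gamma, Sigma):
    """Whitened 2n-row system S = T lambda + u, u ~ N(0, I_{2n}).

    Stacks [Y - X beta - W gamma; X] = [0; (Z W)] lambda + [eps; eta]
    and premultiplies by (C^{-1} kron I_n) with Sigma = C C'.
    """
    A = np.column_stack([Z, W])                 # n x (q+p) first-stage design
    eps = Y - beta * X - W @ gamma              # known once rho is fixed
    Ci = np.linalg.inv(np.linalg.cholesky(Sigma))
    # Block rows of (C^{-1} kron I_n) applied to the stacked response and design.
    S = np.concatenate([Ci[0, 0] * eps + Ci[0, 1] * X,
                        Ci[1, 0] * eps + Ci[1, 1] * X])
    T = np.vstack([Ci[0, 1] * A, Ci[1, 1] * A])  # top block of the design is zero
    return S, T


def lambda_conditional(S, T):
    """Mean and precision of (2.5): lambda_hat = Omega^{-1} T'S, Omega = I + T'T."""
    Omega = np.eye(T.shape[1]) + T.T @ T
    return np.linalg.solve(Omega, T.T @ S), Omega


def sample_lambda(S, T, rng):
    """One draw from N(lambda_hat, Omega^{-1}) via the Cholesky factor of Omega."""
    lam_hat, Omega = lambda_conditional(S, T)
    L = np.linalg.cholesky(Omega)               # Omega = L L'
    return lam_hat + np.linalg.solve(L.T, rng.standard_normal(len(lam_hat)))
```

With this construction, $T'T = [Z\ W]'[Z\ W] / (\sigma_{22} - \sigma_{12}^2/\sigma_{11})$, so Step 2 reduces to a regression of $X - \frac{\sigma_{12}}{\sigma_{11}}\epsilon$ on $[Z\ W]$, mirroring Step 1.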
2.2.3 Step 3: The Conditional Posterior Distribution of $\Sigma$



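Finally, to determine $\mathrm{pr}(\Sigma \mid \rho, \lambda, \mathcal{D})$, we use an inverse-Wishart prior (e.g. Anderson, 1984). Thus, $\Sigma \sim \mathcal{W}^{-1}(I_2, 3)$ and, as the inverse-Wishart is conjugate, we have

$$\Sigma \mid \rho, \lambda, \mathcal{D} \sim \mathcal{W}^{-1}(I_2 + Q, 3 + n), \qquad (2.6)$$

where $Q = [\epsilon\ \eta]'[\epsilon\ \eta]$.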
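Given $\rho$ and $\lambda$, the residuals $\epsilon = Y - X\beta - W\gamma$ and $\eta = X - Z\delta - W\tau$ are known, and (2.6) is a single draw. The sketch below uses SciPy's `scipy.stats.invwishart`, and assumes its `scale` and `df` arguments correspond to the first and second arguments of $\mathcal{W}^{-1}(\cdot, \cdot)$ above; inverse-Wishart parameterizations differ across references, so this correspondence should be checked against Appendix A.

```python
import numpy as np
from scipy.stats import invwishart


def sample_sigma(eps, eta, rng):
    """One draw of Sigma | rho, lambda, D ~ W^{-1}(I_2 + Q, 3 + n), as in (2.6).

    Assumes scipy's (scale, df) convention matches the paper's W^{-1}(scale, df).
    """
    R = np.column_stack([eps, eta])     # n x 2 matrix [eps eta]
    Q = R.T @ R
    n = R.shape[0]
    return invwishart.rvs(df=3 + n, scale=np.eye(2) + Q, random_state=rng)
```

Under this parameterization the conditional mean is $(I_2 + Q)/n$, so for large $n$ the draw concentrates near the sample covariance of the residuals.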



3 Incorporating Model Uncertainty 

We outline our method for incorporating model uncertainty into the estimation of the framework (2.1) and (2.2). In order to explain the motivation behind our CBF approach, we first review some basic results from classic model selection problems. We then show how the concept of Bayes factors can be usefully embedded in a Gibbs sampler, yielding CBFs, and that these CBFs lead to straightforward calculations. The section concludes with an overview of the full IVBMA procedure.

3.1 Bayes Factors 

In a general framework, incorporating model uncertainty involves considering a collection of candidate models $\mathcal{I}$, using the data $\mathcal{D}$. Each model $I$ consists of a collection of probability distributions for the data $\mathcal{D}$, $\{\mathrm{pr}(\mathcal{D} \mid \psi), \psi \in \Psi_I\}$, where $\Psi_I$ denotes the parameter space for the parameters of model $I$ and is a subset of the full parameter space $\Psi$.

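By letting the model become an additional parameter to be assessed in the posterior, we aim to calculate the posterior model probabilities given the data $\mathcal{D}$. By Bayes' rule,

$$\mathrm{pr}(I \mid \mathcal{D}) = \frac{\mathrm{pr}(\mathcal{D} \mid I)\,\mathrm{pr}(I)}{\sum_{I' \in \mathcal{I}} \mathrm{pr}(\mathcal{D} \mid I')\,\mathrm{pr}(I')}, \qquad (3.1)$$

where $\mathrm{pr}(I)$ denotes the prior probability for model $I \in \mathcal{I}$. The integrated likelihood $\mathrm{pr}(\mathcal{D} \mid I)$ is defined by

$$\mathrm{pr}(\mathcal{D} \mid I) = \int_{\Psi_I} \mathrm{pr}(\mathcal{D} \mid \psi)\,\mathrm{pr}(\psi \mid I)\,d\psi, \qquad (3.2)$$

where $\mathrm{pr}(\psi \mid I)$ is the prior for $\psi$ under model $I$, which by definition has all its mass on $\Psi_I$.

One possibility for pairwise comparison of models is offered by the Bayes factor (BF), which is in most cases defined together with the posterior odds (Kass and Raftery, 1995).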
Definition 1 (Posterior odds and Bayes factor) 

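The posterior odds of model $I$ versus model $I'$ is given by

$$\frac{\mathrm{pr}(I \mid \mathcal{D})}{\mathrm{pr}(I' \mid \mathcal{D})} = \frac{\mathrm{pr}(\mathcal{D} \mid I)}{\mathrm{pr}(\mathcal{D} \mid I')} \cdot \frac{\mathrm{pr}(I)}{\mathrm{pr}(I')},$$

where

$$\frac{\mathrm{pr}(\mathcal{D} \mid I)}{\mathrm{pr}(\mathcal{D} \mid I')} \quad \text{and} \quad \frac{\mathrm{pr}(I)}{\mathrm{pr}(I')}$$

denote the Bayes factor and the prior odds of $I$ versus $I'$, respectively.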
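For a finite collection $\mathcal{I}$ whose integrated likelihoods are available, (3.1) and Definition 1 reduce to a few lines of arithmetic. The Python/NumPy sketch below (function names are illustrative) works on the log scale with the usual log-sum-exp shift, since integrated likelihoods of realistic data sets underflow double precision.

```python
import numpy as np


def posterior_model_probs(log_marglik, log_prior):
    """pr(I | D) for every model in a finite collection, via (3.1) on the log scale."""
    log_post = np.asarray(log_marglik, dtype=float) + np.asarray(log_prior, dtype=float)
    log_post = log_post - log_post.max()   # log-sum-exp shift avoids underflow
    weights = np.exp(log_post)
    return weights / weights.sum()


def log_bayes_factor(log_marglik_I, log_marglik_I_prime):
    """log of pr(D | I) / pr(D | I') from Definition 1."""
    return log_marglik_I - log_marglik_I_prime


def log_posterior_odds(log_marglik_I, log_marglik_I_prime, log_prior_I, log_prior_I_prime):
    """log posterior odds = log Bayes factor + log prior odds."""
    return (log_bayes_factor(log_marglik_I, log_marglik_I_prime)
            + log_prior_I - log_prior_I_prime)
```

For instance, two equally likely models whose log integrated likelihoods differ by 1 have a Bayes factor of $e \approx 2.72$ and posterior probabilities of about 0.73 and 0.27.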



When the integrated likelihood (3.2), and thus the BF, can be computed directly, a straightforward method for exploring the model space, Markov chain Monte Carlo model composition (MC3), was developed by Madigan et al. (1995).

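MC3 determines posterior model probabilities by generating a stochastic process that moves through the model space $\mathcal{I}$ and has equilibrium distribution $\mathrm{pr}(I \mid \mathcal{D})$. Given the current state $I^{(s)}$, MC3 proposes a new model $I'$ according to a proposal distribution $q(\cdot \mid \cdot)$, calculates

$$\alpha = \frac{\mathrm{pr}(\mathcal{D} \mid I')\,\mathrm{pr}(I')\,q(I^{(s)} \mid I')}{\mathrm{pr}(\mathcal{D} \mid I^{(s)})\,\mathrm{pr}(I^{(s)})\,q(I' \mid I^{(s)})}$$

and sets $I^{(s+1)} = I'$ with probability $\min\{\alpha, 1\}$, otherwise setting $I^{(s+1)} = I^{(s)}$.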
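A minimal MC3 sketch in Python/NumPy, for models coded as 0/1 inclusion vectors over $k$ candidate variables. It assumes a user-supplied function `log_target` returning $\log \mathrm{pr}(\mathcal{D} \mid I) + \log \mathrm{pr}(I)$ up to a constant, and uses the common "flip one variable" neighborhood proposal; because that proposal is symmetric, the $q$ terms cancel in $\alpha$. The toy target in the usage lines (independent inclusion log-odds) is hypothetical and chosen only because its exact inclusion probabilities are known.

```python
import numpy as np


def mc3(log_target, k, n_iter, rng):
    """Run MC3 over 0/1 inclusion vectors of length k; returns the visited models.

    log_target(model) must return log pr(D | I) + log pr(I) up to a constant.
    """
    model = np.zeros(k, dtype=int)
    current = log_target(model)
    visits = np.empty((n_iter, k), dtype=int)
    for s in range(n_iter):
        proposal = model.copy()
        j = rng.integers(k)                # neighbour: flip one variable
        proposal[j] = 1 - proposal[j]
        cand = log_target(proposal)
        # Accept with probability min{alpha, 1}; q cancels (symmetric proposal).
        if np.log(rng.uniform()) < cand - current:
            model, current = proposal, cand
        visits[s] = model
    return visits


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    log_odds = np.array([2.0, 0.0, -2.0])          # toy target, exact answer known
    draws = mc3(lambda m: float(m @ log_odds), 3, 60000, rng)
    print(draws.mean(axis=0))                      # approx. 1 / (1 + exp(-log_odds))
```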



3.2 Model Determination for Two-Staged Problems 



We now consider the incorporation of model uncertainty into the system (2.1) and (2.2). We follow the notation of Lenkoski et al. (2011). Associated with the outcome equation (2.1), we consider a collection of models $\mathcal{L}$. Each $L \in \mathcal{L}$ consists of a different restriction on the parameter $\rho$, and we denote by $\Gamma_L \subset \mathbb{R}^{1+p}$ this restricted space. Similarly, in the instrument equation (2.2) we consider a collection $\mathcal{M}$ of models which impose restrictions on the vector $\lambda$, and associate with each $M \in \mathcal{M}$ a space $\Lambda_M \subset \mathbb{R}^{q+p}$.

Ideally, we would be able to incorporate model uncertainty into this system in a manner analogous to that described above. Unfortunately,

$$\mathrm{pr}(\mathcal{D} \mid L, M) = \int_{\mathcal{P}_2} \int_{\Lambda_M} \int_{\Gamma_L} \mathrm{pr}(\mathcal{D} \mid \rho, \lambda, \Sigma)\,\mathrm{pr}(\rho \mid L)\,\mathrm{pr}(\lambda \mid M)\,\mathrm{pr}(\Sigma)\,d\rho\,d\lambda\,d\Sigma$$

cannot be directly calculated in any obvious manner. Therefore an implementation of MC3 on the product space $\mathcal{L} \times \mathcal{M}$ is infeasible. What we show below, however, is that embedding MC3 within the Gibbs sampler, and therefore using CBFs to move between models, offers an extremely efficient solution. CBFs were originally discussed in Dickey and Gunel (1978) and for the IV framework are defined below.



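Definition 2 (Conditional Bayes factor)

Given the system (2.1) and (2.2), let $\Sigma$ be the covariance matrix and $\lambda$ and $\rho$ denote the parameters of the first and second stage, respectively.

(a) The CBF of second-stage models $L$ and $L'$ is defined as

$$\mathrm{CBF}_{sec} = \frac{\mathrm{pr}(\mathcal{D} \mid L', \lambda, \Sigma)}{\mathrm{pr}(\mathcal{D} \mid L, \lambda, \Sigma)}.$$

(b) For first-stage models $M$ and $M'$ the CBF is given by

$$\mathrm{CBF}_{fst} = \frac{\mathrm{pr}(\mathcal{D} \mid M', \rho, \Sigma)}{\mathrm{pr}(\mathcal{D} \mid M, \rho, \Sigma)}.$$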

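Considering $\mathrm{CBF}_{sec}$, we can see that it relies on determining the quantity

$$\mathrm{pr}(\mathcal{D} \mid L, \lambda, \Sigma) = \int_{\Gamma_L} \mathrm{pr}(\mathcal{D} \mid \rho, \lambda, \Sigma)\,\mathrm{pr}(\rho \mid L)\,d\rho,$$

which is, in essence, an integrated likelihood for model $L$ conditional on fixed values of $\lambda$ and $\Sigma$. In Appendix B we show that

$$\int_{\Gamma_L} \mathrm{pr}(\mathcal{D} \mid \rho, \lambda, \Sigma)\,\mathrm{pr}(\rho \mid L)\,d\rho \propto |H_L|^{-1/2} \exp\left(\tfrac{1}{2}\hat{\rho}_L' H_L \hat{\rho}_L\right), \qquad (3.3)$$

where $\hat{\rho}_L$ and $H_L$ are defined in Appendix B, but are exactly analogous to the $\hat{\rho}$ and $H$ discussed in Section 2.2.1, restricted to the subset of $X$ and $W$ included in model $L$.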
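Both (3.3) and (3.4) have the form of the normalizing constant of a Bayesian linear regression with a $\mathcal{N}(0, I)$ prior and known noise variance. The Python/NumPy sketch below (names are illustrative) evaluates the logarithm of that model-dependent factor, forms a log CBF between two column subsets, and includes a brute-force check that integrates the coefficients out exactly, as $y \sim \mathcal{N}(0, s^2 I + AA')$. Under our reading of Section 2.2, (3.3) corresponds to `y` $= \tilde{Y}$, `A` $= V$ restricted to $L$ and `noise_var` $= \xi$, while (3.4) corresponds to `y` $= S$, `A` $= T$ restricted to $M$ and `noise_var` $= 1$.

```python
import numpy as np


def log_cond_marginal(y, A, noise_var):
    """log(|H|^{-1/2} exp(0.5 c_hat' H c_hat)): the model-dependent part of (3.3)/(3.4).

    Model: y = A c + e, e ~ N(0, noise_var I), prior c ~ N(0, I).
    """
    H = np.eye(A.shape[1]) + A.T @ A / noise_var
    c_hat = np.linalg.solve(H, A.T @ y / noise_var)
    _, logdet = np.linalg.slogdet(H)
    return -0.5 * logdet + 0.5 * c_hat @ H @ c_hat


def log_cbf(y, A, cols_new, cols_old, noise_var):
    """log CBF of the model using columns cols_new versus the one using cols_old."""
    return (log_cond_marginal(y, A[:, cols_new], noise_var)
            - log_cond_marginal(y, A[:, cols_old], noise_var))


def log_marginal_exact(y, A, noise_var):
    """Brute-force check: with c integrated out, y ~ N(0, noise_var I + A A')."""
    n = len(y)
    C = noise_var * np.eye(n) + A @ A.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
```

The factors $(2\pi s^2)^{-n/2}\exp(-y'y/2s^2)$ dropped from (3.3) do not depend on the model, so `log_cbf` and the difference of two `log_marginal_exact` values agree to rounding error.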



Similarly, in Appendix B we show that

$$\mathrm{pr}(\mathcal{D} \mid M, \rho, \Sigma) \propto |\Omega_M|^{-1/2} \exp\left(\tfrac{1}{2}\hat{\lambda}_M' \Omega_M \hat{\lambda}_M\right), \qquad (3.4)$$

where $\hat{\lambda}_M$ and $\Omega_M$ are again defined in Appendix B, but are analogous to the similar quantities discussed in Section 2.2.2.²

Equations (3.3) and (3.4) show that both $\mathrm{CBF}_{fst}$ and $\mathrm{CBF}_{sec}$ can be calculated directly. Furthermore, these calculations are extremely straightforward and involve computing little more than the parameters necessary for sampling in the Gibbs sampler.

² However, as noted in the Appendix, when $\beta = 0$, and thus the endogenous variable is not included in (2.1), the update is altered and resembles a seemingly unrelated regression update.



3.3 Model Space Priors 



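Setting a prior on models in the IVBMA framework requires, at a minimum, some subtlety in order to guarantee that the pair constitutes an IV specification. Let $\mathcal{A} \subset \mathcal{L} \times \mathcal{M}$ be such that $(L, M) \in \mathcal{A}$ if and only if $M \setminus L \neq \emptyset$, that is, the first-stage model contains at least one variable that is excluded from the second-stage model. We therefore are only interested in considering model pairs in the collection $\mathcal{A}$.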
In what follows, we assume

$$\mathrm{pr}(L, M) \propto \mathbb{1}\{(L, M) \in \mathcal{A}\}.$$

In other words, we assume a uniform prior on the space of models in $\mathcal{A}$. Other priors on the model space (e.g. Brock et al., 2003; Scott and Berger, 2006; Durlauf et al., 2008; Ley and Steel, 2009) could easily be accommodated.
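The restriction to $\mathcal{A}$ is a simple set check. In the Python sketch below, representing models as sets of variable names is our own illustrative encoding; the log prior is $0$ on $\mathcal{A}$ and $-\infty$ off it, so an MC3 step rejects any proposal that leaves $\mathcal{A}$.

```python
import math


def is_valid_pair(L, M):
    """True iff (L, M) is in A, i.e. M \\ L is non-empty: the first-stage model
    contains at least one variable excluded from the second stage."""
    return len(set(M) - set(L)) > 0


def log_model_prior(L, M):
    """log pr(L, M) up to an additive constant under the uniform prior on A."""
    return 0.0 if is_valid_pair(L, M) else -math.inf


if __name__ == "__main__":
    print(is_valid_pair({"X", "W1"}, {"Z1", "W1"}))        # True: Z1 is excluded from stage 2
    print(is_valid_pair({"X", "W1", "W2"}, {"W1", "W2"}))  # False: no excluded variable
```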



3.4 The IVBMA Algorithm 



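Building upon the original Gibbs sampler discussed in Section 2.2 and the derivations in Section 3.2, we now outline the IVBMA algorithm, which relies on an MC3-within-Gibbs sampler.³ IVBMA creates a sequence $\{\theta^{(1)}, \ldots, \theta^{(S)}\}$ where

$$\theta^{(s)} = (\rho^{(s)}, \lambda^{(s)}, \Sigma^{(s)}, L^{(s)}, M^{(s)})$$

with $\rho^{(s)} \in \Gamma_{L^{(s)}}$, $\lambda^{(s)} \in \Lambda_{M^{(s)}}$ and $(L^{(s)}, M^{(s)}) \in \mathcal{A}$. Given the current state $\theta^{(s)}$ and the data $\mathcal{D}$, IVBMA proceeds as follows:

1. Update $L$: First, sample $L'$ from the neighborhood of $L^{(s)}$ (i.e. uniformly on those models that differ from $L^{(s)}$ by only one variable). Then calculate

$$\alpha = \frac{\mathrm{pr}(\mathcal{D} \mid L', \lambda^{(s)}, \Sigma^{(s)})}{\mathrm{pr}(\mathcal{D} \mid L^{(s)}, \lambda^{(s)}, \Sigma^{(s)})}$$

using Equation (3.3). With probability $\min\{\alpha, 1\}$ set $L^{(s+1)} = L'$, otherwise set $L^{(s+1)} = L^{(s)}$.

2. Update $\rho$: Sample $\rho^{(s+1)} \sim \mathcal{N}(\hat{\rho}_{L^{(s+1)}}, H^{-1}_{L^{(s+1)}})$ as discussed in Appendix B.

3. Update $M$: Sample $M'$ from the neighborhood of $M^{(s)}$, then calculate

$$\alpha = \frac{\mathrm{pr}(\mathcal{D} \mid M', \rho^{(s+1)}, \Sigma^{(s)})}{\mathrm{pr}(\mathcal{D} \mid M^{(s)}, \rho^{(s+1)}, \Sigma^{(s)})}$$

using Equation (3.4). With probability $\min\{\alpha, 1\}$ set $M^{(s+1)} = M'$, otherwise set $M^{(s+1)} = M^{(s)}$.

4. Update $\lambda$: Sample $\lambda^{(s+1)} \sim \mathcal{N}(\hat{\lambda}_{M^{(s+1)}}, \Omega^{-1}_{M^{(s+1)}})$ as discussed in Appendix B.

5. Update $\Sigma$: Use $\lambda^{(s+1)}$ and $\rho^{(s+1)}$ to calculate $\epsilon^{(s+1)}$ and $\eta^{(s+1)}$, and sample

$$\Sigma^{(s+1)} \sim \mathcal{W}^{-1}(I_2 + Q, 3 + n),$$

where $Q = [\epsilon^{(s+1)}\ \eta^{(s+1)}]'[\epsilon^{(s+1)}\ \eta^{(s+1)}]$.

This constitutes the entire IVBMA algorithm. The appeal of the procedure is that it is hardly more involved than the original Gibbs sampler discussed in Section 2.2.

³ In reality, this is simply a special case of the Metropolis-within-Gibbs algorithm (see Chib and Greenberg, 1995), since the MC3 step can be considered a Metropolis-Hastings step in the space of models.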
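To show how little the five steps add to the plain Gibbs sampler, here is a compact sketch in Python (NumPy and SciPy). It is not the ivbma package and makes several simplifying assumptions, labeled in the comments: the endogenous variable is always kept in $L$ (sidestepping the $\beta = 0$ case, where the update changes), empty models receive zero prior weight, proposals leaving $\mathcal{A}$ are rejected because their prior is zero, and Step 4 is written as a regression of $X - \frac{\sigma_{12}}{\sigma_{11}}\epsilon$ on $[Z\ W]$ with variance $\psi = \sigma_{22} - \sigma_{12}^2/\sigma_{11}$, which matches the stacked $S$, $T$ form only if $S$ and $T$ are obtained by Cholesky whitening (itself an assumption about Appendix A). `scipy.stats.invwishart` is assumed to match the $\mathcal{W}^{-1}(\cdot, \cdot)$ parameterization of (2.6).

```python
import numpy as np
from scipy.stats import invwishart


def _post(y, A, s2):
    """N(0, I)-prior regression of y on A with noise variance s2.

    Returns (coef_hat, H, log of the model-dependent factor in (3.3)/(3.4)).
    """
    H = np.eye(A.shape[1]) + A.T @ A / s2
    c_hat = np.linalg.solve(H, A.T @ y / s2)
    return c_hat, H, -0.5 * np.linalg.slogdet(H)[1] + 0.5 * c_hat @ H @ c_hat


def _draw(c_hat, H, rng):
    """One draw from N(c_hat, H^{-1}) using the Cholesky factor of H."""
    C = np.linalg.cholesky(H)
    return c_hat + np.linalg.solve(C.T, rng.standard_normal(len(c_hat)))


def _mc3_step(incl, free, y, A, s2, ok, rng):
    """Flip one column in `free`; accept with prob. min{CBF, 1} if the pair stays in A."""
    prop = incl.copy()
    j = rng.choice(free)
    prop[j] = not prop[j]
    if prop.any() and ok(prop):  # assumption: empty models get zero prior weight
        log_cbf = _post(y, A[:, prop], s2)[2] - _post(y, A[:, incl], s2)[2]
        if np.log(rng.uniform()) < log_cbf:
            return prop
    return incl


def ivbma(Y, X, W, Z, n_iter, seed=0):
    """Sketch of the MC3-within-Gibbs sweep; returns (rho, L-mask, M-mask) per iteration."""
    rng = np.random.default_rng(seed)
    n, p, q = len(Y), W.shape[1], Z.shape[1]
    V, A = np.column_stack([X, W]), np.column_stack([Z, W])
    inL, inM = np.ones(1 + p, bool), np.ones(q + p, bool)  # start at full models
    lam, Sigma = np.zeros(q + p), np.eye(2)

    def valid(iL, iM):  # (L, M) in A iff M \ L is non-empty
        return iM[:q].any() or (iM[q:] & ~iL[1:]).any()

    draws = []
    for _ in range(n_iter):
        s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
        # Steps 1-2: regression of Y_tilde on V with variance xi; column 0 (X) always kept.
        eta = X - A @ lam
        y2, xi = Y - (s12 / s22) * eta, s11 - s12**2 / s22
        inL = _mc3_step(inL, np.arange(1, 1 + p), y2, V, xi,
                        lambda iL: valid(iL, inM), rng)
        rho = np.zeros(1 + p)
        rho[inL] = _draw(*_post(y2, V[:, inL], xi)[:2], rng)
        # Steps 3-4: regression of X - (s12/s11) eps on [Z W] with variance psi.
        eps = Y - V @ rho
        y1, psi = X - (s12 / s11) * eps, s22 - s12**2 / s11
        inM = _mc3_step(inM, np.arange(q + p), y1, A, psi,
                        lambda iM: valid(inL, iM), rng)
        lam = np.zeros(q + p)
        lam[inM] = _draw(*_post(y1, A[:, inM], psi)[:2], rng)
        # Step 5: conjugate inverse-Wishart update (2.6).
        R = np.column_stack([Y - V @ rho, X - A @ lam])
        Sigma = invwishart.rvs(df=3 + n, scale=np.eye(2) + R.T @ R,
                               random_state=rng)
        draws.append((rho, inL.copy(), inM.copy()))
    return draws
```

Posterior inclusion probabilities are then the fraction of retained draws in which each entry of the stored $L$ or $M$ mask is `True`.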



4 Empirical Analysis 

4.1 Determinants of Macroeconomic Growth 

Modeling uncertainty in macroeconomic growth determinants has proven a testing ground for BMA; see Eicher et al. (2011) and the extensive references therein. We consider the dataset used in Lenkoski et al. (2011), which builds on that of Rodrik et al. (2004). These data juxtapose the most prominent development theories and their associated candidate regressors in one comprehensive approach. The data have two endogenous variables, a proxy for institutions (rule of law) and economic integration. There are four potential instruments and 18 additional covariates. Table 1 summarizes the variables included in this study. See Lenkoski et al. (2011) for a detailed description of the dataset and the modeling background.



We took the dataset of Lenkoski et al. (2011) and ran IVBMA for 200,000 iterations, discarding the first 20,000 as burn-in. This took approximately 10 minutes to run. By contrast, the 2SBMA analysis conducted by Lenkoski et al. (2011) on the same data took over 15 hours of computing time. The extreme difference in computing time results from the structure of the two approaches. The 2SBMA methodology of Lenkoski et al. (2011) was designed to mimic the 2SLS estimator. It first ran a separate BMA analysis for each first-stage regression. All models returned from these two runs were paired, and a subsequent BMA was run on the outcome equation for each pair. This led to an extremely large number of second-stage BMA runs and thus considerable computing time. By contrast, IVBMA models the entire system jointly, and this joint approach leads to a dramatic improvement in computational efficiency.

Table 1: Variable Descriptions from RST dataset.

Variable Name       Description
Catholic            Dummy variable taking value 1 if the country's population is predominantly Catholic
East Asia           Dummy variable taking value 1 if a country belongs to South-East Asia, 0 otherwise
EngFrac             Fraction of the population speaking English
TradeShares         Natural logarithm of predicted trade shares computed from a bilateral trade equation with "pure geography" variables
Frost Area          Proportion of land with more than 5 frost-days per month in winter
Frost Days          Average number of frost-days per month in winter
Integration         Natural logarithm of openness. Openness is given by the ratio of (nominal) imports plus exports to GDP (in nominal US dollars)
Latin America       Dummy variable taking value 1 if a country belongs to Latin America or the Caribbean, 0 otherwise
Latitude            Distance from Equator of capital city measured as abs(Latitude)/90
LegalOrigFr         Dummy variable taking value 1 if a country has a legal system deriving from that in France
LegalOrigSocialist  Dummy variable taking value 1 if a country has a socialist legal system
Malaria94           Malaria index, year 1994
MeanTemp            Average temperature (Celsius)
Muslim              Dummy variable taking value 1 if the country's population is predominantly Muslim
Oil                 Dummy variable taking value 1 for a country being a major oil exporter, 0 otherwise
Policy Openness     Dummy variable that indicates if a country has sufficiently market-oriented policies
PopGrowth           Population growth
Protestant          Dummy variable taking value 1 if the country's population is predominantly Protestant
RuleofLaw           Rule of Law index; refers to 2001 and approximates 1990s institutions
SeaAccess           Dummy variable taking value 1 for countries without access to the sea, 0 otherwise
SettlerMortality    Natural logarithm of estimated European settlers' mortality rate
SubSaharaAfrica     Dummy variable taking value 1 if a country belongs to Sub-Saharan Africa, 0 otherwise
Tropics             Percentage of tropical land area

Table 2 shows the resulting posterior estimates. We see a picture similar to that reported by Lenkoski et al. (2011), although with somewhat fewer included determinants. In particular, similar to Lenkoski et al. (2011), English and European fractions serve as the two best instruments of Rule of Law, while neither settler mortality nor trade shares receive high inclusion probabilities. Further, Integration is well-instrumented by trade shares, which receive an inclusion probability of 1. These results are essentially the same as those reported in Lenkoski et al. (2011).

In the second stage, we reach a conclusion similar to, but markedly sparser than, that of Lenkoski et al. (2011). Both rule of law and integration are given strong support by the data, with inclusion probabilities of essentially 1. Beyond these two factors, only the intercept, an indicator for Latin America and an indicator of whether the country has market-oriented policies are given inclusion probabilities above 0.5 in the second stage. In contrast to 2SBMA, which gave support to religious and geographic factors as determinants of macroeconomic growth, IVBMA points strongly to institutions and integration as the leading determinants.

Figure 1 shows the posterior distribution of the second-stage coefficients for the four variables with the highest inclusion probabilities under IVBMA. We also include the posterior distribution of these covariates under an approach that does not incorporate model uncertainty (which we refer to as IV) and uses the algorithm discussed in Section 2.2. Several interesting aspects are clear in Figure 1. Inspecting panel (b), we see that IVBMA has led to a posterior distribution on integration with essentially the same mode as that of IV. However, the IVBMA distribution is considerably more focused, indicating a reduction in parameter variance that results from using parsimonious models.
Table 2: Results for Macroeconomic Growth Determinants Example. In this table we show the posterior inclusion probabilities (Prob), posterior parameter expectations (Mean) and posterior standard deviations (sd) for the two instrument stages (Rule of Law and Trade) as well as the outcome stage.

                  Rule                    Trade                   Outcome
Variable          Prob    Mean    sd      Prob    Mean    sd      Prob    Mean    sd
RuleofLaw         -       -       -       -       -       -       0.999   1.073   0.224
Integration       -       -       -       -       -       -       1       0.992   0.164
SettlerMortality  0.11    -0.009  0.035   0.097   -0.006  0.028   -       -       -
TradeShares       0.111   0.007   0.037   1       0.532   0.088   -       -       -
EnglishFrac       0.91    1.13    0.592   0.539   0.244   0.302   -       -       -
EuropeanFrac      0.667   0.459   0.455   0.16    -0.012  0.087   -       -       -
Intercept         0.271   0.061   0.278   0.999   2.303   0.343   0.546   0.362   0.793
Dist_Equ          0.016           0.002   0.007           0.001   0.015           0.002
Lat_Am            0.539   -0.297  0.371   0.163   -0.017  0.077   0.981   1.018   0.271
Sub_Africa        0.207   -0.029  0.114   0.184   0.027   0.099   0.233   -0.034  0.133
E_Asia            0.415   0.152   0.269   0.957   0.671   0.261   0.381   0.126   0.288
Legor_fr          0.157   0.008   0.067   0.122   0.008   0.045   0.366   0.101   0.173
Catholic          0.007           0.001   0.003                   0.031           0.002
Muslim            0.002                   0.002                   0.017           0.001
Protestant        0.008           0.001   0.021           0.001   0.011           0.001
Tropics           0.939   -0.607  0.24    0.338   0.095   0.177   0.381   -0.131  0.263
SeaAccess         0.165   -0.012  0.075   0.119   0.001   0.046   0.158   -0.005  0.085
Oil               0.344   -0.108  0.212   0.881   0.402   0.21    0.387   0.145   0.266
Frost_Day         0.04    0.001   0.006   0.022           0.002   0.03            0.005
Frost_Area        0.465   0.194   0.289   0.376   0.133   0.24    0.341   0.071   0.253
Malaria94         0.242   -0.042  0.136   0.301   -0.076  0.152   0.243   -0.035  0.149
MeanTemp          0.021           0.004   0.017           0.002   0.026           0.005
Area
Population        0.037   -0.001  0.011   0.042           0.009   1       0.235   0.04
PolicyOpen        0.37    0.118   0.236   0.228   0.021   0.132   0.511   0.249   0.345


The other three panels also show tighter posterior distributions under IVBMA than under IV. However, what is potentially more interesting is that the distributions are also centered in slightly different locations. The effect is particularly large for the Latin America indicator, which is tightly centered about its median of 1.06 under IVBMA, while more diffuse about the median of 0.43 under IV. The respective posterior standard deviations of these two estimates are 0.233 and 0.486.

This effect is also evident for the rule of law parameter estimate. Under IVBMA, this parameter has a median of 1.08 and a posterior standard deviation of 0.224, while under IV this parameter has a median of 0.666 and a standard deviation of 0.284. We note that in Lenkoski et al. (2011), three increasingly larger runs of 2SBMA were conducted. As the number of considered covariates rose, the posterior estimate on rule of law went from 1.276 (with a standard deviation of 0.1772) down to an estimate of 0.798 (0.3155). Therefore, our results are in line with those of Lenkoski et al. (2011); however, it appears evident that IVBMA has introduced more parsimony into the resulting models than the nested approach of 2SBMA.
Panels: (a) Rule of Law, (b) Integration, (c) Latin America, (d) Policy Openness.

Figure 1: Posterior distribution on selected second-stage coefficients, under IVBMA (solid line) and IV (dotted line). In the case of IVBMA, the densities are formed conditional on inclusion in the second-stage model.

4.2 Estimating a Demand Function 
4.2.1 Description of the Data Set 



We use the data provided by Chintagunta et al. (2005) (CDG), which had been collected by AC Nielsen, and follow the approach outlined in Conley et al. (2008). CDG examined the purchase of margarine in Denver, Colorado, over a period of 117 weeks, from January 1993 until March 1995. The sample consists of weekly prices and purchase data for the four main brands of margarine. CDG differentiate between 992 households purchasing margarine, whereas, following Conley et al. (2008), we do not account for heterogeneity but focus on the total number of weekly purchases per brand. Furthermore, the data set offers weekly information on feature ads and display conditions for each of the four brands. For detailed descriptive statistics and marketing conditions of the individual brands, see Chintagunta et al. (2005).



Since retail price is influenced by unobserved characteristics likely to be correlated with sales, it is an endogenous variable. CDG claim that wholesale prices serve as reliable instruments, as they should not be sensitive to retail demand shocks. Their results show that wholesale prices alone explain nearly 80% of the variation in margarine retail prices. Moreover, products with considerable shelf life, such as margarine, are often not sold to the consumer in the same week in which they are bought at the wholesale establishment. Thus, CDG added the wholesale prices of up to six weeks before the purchase week to the matrix of instruments.

Besides these variables, we entertain two more candidate instruments: the Consumer Price Index (CPI) for all urban consumers of Colorado and the CPI for food in the United States, using the data provided by the U.S. Bureau of Labor Statistics (BLS). Since the BLS reports only monthly data, we use the same value for all weeks in the respective month. Weeks that span two months are assigned to the month to which the majority of their days belong. We do not expect these variables to perform as well as wholesale prices because they are not collected at a brand level. However, we think it is reasonable (or at least vaguely plausible) that overall price levels should influence the price of margarine. Our matrix $Z$ therefore consists of nine candidate instrumental variables (see Table 3 for an overview).
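The week-to-month rule can be written in a few lines. The Python sketch below assumes each sales week is identified by the date of its first day and spans seven consecutive days; the function names and the toy CPI values in the usage lines are hypothetical, not the BLS figures.

```python
from collections import Counter
from datetime import date, timedelta


def month_for_week(week_start):
    """Return (year, month) holding the majority of the 7 days starting at week_start.

    A 7-day week touches at most two months, so one of them always has >= 4 days.
    """
    days = (week_start + timedelta(days=i) for i in range(7))
    counts = Counter((d.year, d.month) for d in days)
    return counts.most_common(1)[0][0]


def monthly_to_weekly(monthly, week_starts):
    """Broadcast a {(year, month): value} series to weekly observations."""
    return [monthly[month_for_week(w)] for w in week_starts]


if __name__ == "__main__":
    cpi = {(1993, 1): 100.0, (1993, 2): 100.4}   # hypothetical values
    weeks = [date(1993, 1, 22), date(1993, 1, 29)]
    print(monthly_to_weekly(cpi, weeks))          # [100.0, 100.4]
```

The week starting 29 January 1993 has three days in January and four in February, so it receives the February value.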

In addition to feature ads and display conditions, we entertain several additional variables with a potential effect on both demand and retail price. Our hypothesis is that holidays could positively affect the demand for margarine. We therefore collected data from the Denver Public Schools showing the days free of school for the school years 1992/93, 1993/94 and 1994/95. Distinguishing between whole weeks of holiday and weeks containing only one or two free days, we created two dummy variables and added them to the matrix $W$.

We also consider the Local Area Unemployment Statistics (LAUS) of Colorado. These 
monthly data provided by the BLS are again adapted to our weekly setup in the manner 
described above. 

Moreover, we entertain the possibility that temperature might also have explanatory power for the purchase of margarine. We therefore collected historical temperature data for the Denver area from January 1993 until March 1995. Finally, we add four fixed variables to $W$ for distinguishing between brands (an intercept and three brand indicators). Table 3 summarizes the different regressors with short descriptions.



Following Conley et al. (2008), we examine the logarithm of each brand's weekly share of sales instead of the absolute sales figures. Additionally, we use the logarithm of retail prices as the endogenous regressor, yielding the regression system

$$\log \text{Share} = \beta \log(\text{retail price}) + W\gamma + \epsilon$$

$$\log(\text{retail price}) = Z\delta + W\tau + \eta.$$

These transformations are performed so that the framework (2.1) and (2.2) can be applied. A more involved specification would directly address the discrete choice nature of the dataset; we discuss this feature in the Conclusions section.

4.2.2 Results - Factors influencing the Demand for Margarine 

For the margarine data we considered 19 potential influencing factors in the first stage, amongst them 9 instruments. In the second stage, we chose 10 variables to predict the log shares of sales.

Variable Name  Description
WP             Weekly wholesale prices for the four different brands of margarine
lag1 WP        Wholesale prices one week before sale to consumer
lag2 WP        Wholesale prices two weeks before sale to consumer
lag3 WP        Wholesale prices three weeks before sale to consumer
lag4 WP        Wholesale prices four weeks before sale to consumer
lag5 WP        Wholesale prices five weeks before sale to consumer
lag6 WP        Wholesale prices six weeks before sale to consumer
CPI Food       CPI for food in general in the U.S.
CPI UrbCol     CPI for all urban consumers of Colorado

Feature Ad     Variable indicating the existence and degree of feature ads at the product shelves
Display        Variable describing the display conditions
Intercept      Vector with value 1, reference point for brand indicators
Brand2         Dummy variable indicating brand 2
Brand3         Dummy variable indicating brand 3
Brand4         Dummy variable indicating brand 4
WeekHol        Dummy variable taking value 1 if the whole week was free at Denver Public Schools
InterHol       Dummy variable taking value 1 if the week had only one or two free days at Denver Public Schools
Temp           Average weekly temperature in Denver, Colorado (in Celsius)
Unemploy       Local Area Unemployment Statistics for Colorado

Table 3: Descriptions of the variables contained in Z and W (upper and lower part of the table, respectively).

We ran IVBMA for 250,000 iterations and discarded the first 50,000 as burn-in. In order to
examine the mixing properties of IVBMA, we ran 50 independent instances of the algorithm,
initialized at different random starting points and using different random seeds. On
average, each run took approximately 5 minutes on the hardware discussed above, and all 50
instances returned identical posterior estimates, indicating convergence and no issues
regarding mixing. Figure 2 shows a rough diagnostic of this convergence: the average
first-stage (Equation 2.2) and second-stage (Equation 2.1) model size by log iteration for
each of the 50 chains. The chains agree rapidly, with an average model size of 11.08 in the
first (instrument) stage and 6.30 in the second (outcome) stage. While this visual display is
only a rough diagnostic, it illustrates the quick convergence and lack of mixing difficulties
of IVBMA. Indeed, 250,000 iterations may have been unnecessary, as all chains agree within
the first 50,000 post-burn-in iterations.
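A diagnostic in the spirit of Figure 2 can be sketched as follows. This is a minimal
illustration rather than the code used for the figure: it assumes the sampler output has been
stored in a hypothetical boolean array `inclusion` of shape (chains, iterations, variables),
and it computes each chain's running mean model size at log-spaced iterations.

```python
import numpy as np

def running_model_size(inclusion, n_points=25):
    """Running mean model size of each chain at log-spaced iterations.

    inclusion: boolean array of shape (n_chains, n_iter, n_vars) (hypothetical
        storage format); entry [c, s, j] is True if variable j is in the model
        at iteration s of chain c.
    Returns (iters, avg), where avg[c, k] is chain c's mean model size over
    its first iters[k] iterations.
    """
    inclusion = np.asarray(inclusion, dtype=bool)
    _, n_iter, _ = inclusion.shape
    sizes = inclusion.sum(axis=2)                      # model size of every draw
    cum_mean = np.cumsum(sizes, axis=1) / np.arange(1, n_iter + 1)
    # Log-spaced checkpoints, always including the final iteration
    grid = np.geomspace(1, n_iter, n_points).astype(int)
    iters = np.unique(np.append(grid, n_iter))
    return iters, cum_mean[:, iters - 1]

# Usage: spread of the final running means across chains
# iters, avg = running_model_size(inclusion)
# spread = avg[:, -1].max() - avg[:, -1].min()
```

Plotting each row of `avg` against `np.log(iters)` gives one curve per chain; curves that
settle on a common value, as in Figure 2, are consistent with good mixing.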

Columns 1 and 5 of Table 4 show the inclusion probabilities of both stages returned by
IVBMA. We note that the log price, the endogenous factor, is given an inclusion probability
of 1. The other columns in Table 4 provide the lower, median, and upper bounds (the 2.5%,
50% and 97.5% quantiles, respectively) of the resulting parameter samples. The posterior
distribution of β has a median of -2.161 with a narrow range (-2.839 to -1.637), confirming
the expectation that a higher retail price of margarine diminishes quantity sold.

Figure 2: Average model size by log iteration for the instrument and outcome models across
50 separate instances of IVBMA; panel (a) shows the Instrument Stage and panel (b) the
Outcome Stage. This provides a rough diagnostic that the IVBMA chains have converged.

Regarding the results for members of W, we see that in both stages the feature ad and
display variables have inclusion probabilities of more than 95%. The median parameter values
in Table 4 (columns 3 and 7) indicate that feature ads and display conditions have a
negative effect on price but simultaneously a positive influence on the volume of demand.

Among the variables we added, temperature affects neither prices nor sales figures. In our
view, this shows the utility of IVBMA: we were able to entertain this additional factor, and
the method promptly rejected its inclusion. Similar results are found for both holiday
variables, which have inclusion probabilities of less than 6% and therefore little influence
in either stage.

However, the unemployment rate for Colorado offers a surprise. This variable has a high
inclusion probability of 87% in the first stage and a negligible influence at the second
stage. It therefore seems that unemployment could serve as an instrument for the endogenous
price variable. We feel this could be reasonable, largely because our dependent variable is
share and not quantity sold. Thus, we can imagine declining economic conditions inducing
retailers to lower the price of consumer staples, while these conditions may have less effect
on the overall product mix sold once price adjustments are accounted for. As we can see in
column 2 of Table 4, this factor is negatively directed, i.e. the higher the unemployment rate
in Colorado, the lower the price of margarine.

                         First Stage                          Second Stage
Variable     Prob     Lower    Median   Upper      Prob     Lower    Median   Upper
log price    -        -        -        -          1        -2.839   -2.161   -1.637
feat.W1      0.965    -0.069   -0.046   0          0.995    0.182    0.375    0.559
disp.W2      0.97     -0.306   -0.195   0          0.997    0.678    1.508    2.311
Int          0.999    -2.603   -1.417   -1.261     1        -5.508   -4.556   -3.86
Brand2       1        0.245    0.338    0.426      1        0.831    1.031    1.33
Brand3       1        0.119    0.141    0.163      0.997    0.152    0.267    0.408
Brand4       1        0.172    0.198    0.222      0.137    -0.014   0        0.19
InterHol     0.006    0        0        0          0.057    -0.009   0        0.009
WeekHol      0.012    0        0        0          0.059    0        0        0.044
Temp         0        0        0        0          0.007    0        0        0
Unemploy     0.869    -0.031   -0.022   0          0.054    -0.036   0        0
lag6.Z6      0.654    -0.503   0.447    2.68       -        -        -        -
lag5.Z5      0.603    -0.657   0.047    2.487      -        -        -        -
lag4.Z4      0.548    -0.842   0        2.246      -        -        -        -
lag3.Z3      0.529    -0.926   0        2.115      -        -        -        -
lag2.Z2      0.509    -1.073   0        1.94       -        -        -        -
lag1.Z1      0.485    -1.193   0        1.765      -        -        -        -
WP.Z0        0.5      -1.097   0        1.841      -        -        -        -
CPIFood      0.302    -0.682   0        0.503      -        -        -        -
CPIUrb       0.135    0        0        0.008      -        -        -        -

Table 4: IVBMA results for the margarine dataset. This table shows first and second stage
inclusion probabilities (Prob) as well as 2.5%, 50% and 97.5% posterior quantiles (Lower,
Median, Upper, respectively) for each variable included. A dash marks a variable that does
not enter the corresponding stage.

Use of wholesale prices as instruments is confirmed, but not wholeheartedly. These variables'
inclusion probabilities range between 48% and 66% (column 1, Table 4). Interestingly, the
effect of wholesale prices increases with the weekly time lags, which is also reflected in the
posterior medians of the regression coefficients in column 3 of Table 4. As already reasoned
above, this could be grounded in the fact that retailers often buy their products from the
wholesalers some time before they are sold to the consumer. Besides wholesale prices, the CPI
variables also seem to have some, albeit limited, influence on the retail price of margarine.
The CPI for food and the CPI of urban consumers are given inclusion probabilities of 30% and
13%, respectively.
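The summaries in Table 4 can be computed directly from model-averaged draws. The sketch
below assumes a hypothetical storage format in which a coefficient is recorded as exactly 0
in every iteration where its variable is excluded: the inclusion probability is then the
share of nonzero draws, and the quantiles are taken over all draws, zeros included. One
consequence is that any variable excluded in more than half of the draws has a posterior
median of exactly 0.

```python
import numpy as np

def bma_summary(draws):
    """Inclusion probability and 2.5%/50%/97.5% quantiles of BMA draws.

    draws: array of shape (n_draws, n_vars); a coefficient is stored as
        exactly 0 whenever its variable is excluded from the sampled model.
    """
    draws = np.asarray(draws, dtype=float)
    prob = np.mean(draws != 0.0, axis=0)                 # share of draws including the variable
    lower, median, upper = np.quantile(draws, [0.025, 0.5, 0.975], axis=0)
    return prob, lower, median, upper
```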
5 Conclusion



We have proposed a computationally efficient solution to the problem of incorporating model
uncertainty into IV estimation. The IVBMA method leverages an existing Gibbs sampler and
shows that by nesting model moves inside this framework, model averaging can be performed
with minimal additional effort. In contrast to the approximate solution proposed by
Lenkoski et al. (2011), our method yields a theoretically justified, fully Bayesian
procedure. The applied examples show the utility of the method: it enables the researcher to
entertain additional factors, which are either incorporated where appropriate or promptly
dropped.



The RJMCMC methodology proposed by Koop et al. (2012) constitutes an alternative approach to
this problem. Their method is considerably more flexible; it allows a range of different prior
distributions to be entertained and simultaneously addresses hypotheses related to
identification in the IV system. At the same time, this flexibility comes at a cost.
Koop et al. (2012) note that their method may exhibit difficulties in mixing, and they are
required to consider a complicated model proposal system involving "hot", "cold", and
"super-hot" models, which has similarities to simulated tempering. In contrast, IVBMA appears
to exhibit few difficulties in mixing, which derives from the simplicity of the algorithm. We
feel that, at the very least, IVBMA offers a useful methodology for the applied researcher,
who may be willing to accept the priors we propose in order to quickly obtain useful insight
and parameter estimates.

In the IV framework we develop, we consider only one endogenous variable for clarity of
exposition. Multiple endogenous variables pose no significant additional difficulties. The
Gibbs sampler in Section 2.2 requires repeated evaluations of a slightly modified Step 2. The
IVBMA framework simply consists of different first-stage models M for each endogenous
variable, and the CBFs are hardly changed. This generalization of our framework has already
been incorporated into the R package ivbma.

One assumption that is crucial to the functioning of the Gibbs sampler is the bivariate
normality of the residuals in (2.3). Conley et al. (2008) discuss how the algorithm of
Rossi et al. (2006) can be extended to handle deviations from normality using a Dirichlet
process mixture (DPM). We note that the IVBMA methodology can readily be incorporated into
the DPM framework of Conley et al. (2008) simply by replacing the IV kernel distributions of
Rossi et al. (2006) with IVBMA kernel distributions.

A critical feature that has not been addressed in IVBMA is that of instrument validity.
Lenkoski et al. (2011) propose an approximate test of instrument validity by directly
embedding the test of Sargan (1958) into a model averaging framework. While this appears to
work well, we are currently researching a "fully Bayesian" version of the Sargan test, which
is based on the CBF of regressing the instrument set on the residuals of (2.1). Subsequent
research will develop this test and incorporate it into the IVBMA method. A prototype of
this diagnostic is already implemented in ivbma.

Finally, as we note, the margarine dataset is a simplification, as it ignores the aspect of
multinomial choice and significantly reduces the household information collected. Following
Conley et al. (2008), log shares were used to fit the data into the framework (2.1) and
(2.2). However, we feel that IVBMA has the potential to be extended to more complicated
likelihood frameworks. Since discrete choice models may be represented in a generalized
linear model (GLM) framework with latent Gaussian factors (for instance via a multinomial
probit), a promising next step will be to consider embedding IVBMA in a GLM and operating on
these latent factors. In our view, this indicates the true potential benefit of IVBMA: since
the entire method uses a Gibbs framework, it may be incorporated in any setting where
endogeneity, model uncertainty and latent Gaussianity are present.

6 Acknowledgements 

The authors would like to thank Pradeep Chintagunta for supplying the margarine dataset, 
Theo S. Eicher for several helpful comments and Andreas Neudecker for support organizing 
the software associated with this work. Alex Lenkoski gratefully acknowledges support from 
the German Science Foundation (DFG), grant GRK 1653. 



References 

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis. New York:
Wiley.

Brock, W., S. N. Durlauf, and K. West (2003). Policy evaluation in uncertain economic 
environments. Brookings Papers on Economic Activity 1, 235-322. 

Chen, H., A. Mirestean, and C. Tsangarides (2009). Limited information Bayesian model
averaging for dynamic panels with short time periods. IMF Working Paper WP/09/74.

Chib, S. and E. Greenberg (1995). Understanding the Metropolis-Hastings algorithm. The
American Statistician 49, 327-335.






Chintagunta, P. K., J. P. Dube, and K. Y. Goh (2005). Beyond the endogeneity bias: 
The effect of unmeasured brand characteristics on household-level brand choice models. 
Management Science 51, 832-849. 

Cohen-Cole, E., S. Durlauf, J. Fagan, and D. Nagin (2009). Model uncertainty and the
deterrent effect of capital punishment. American Law and Economics Review 11, 335-369.

Conley, T. G., C. Hansen, and P. E. Rossi (2008). Plausibly exogenous. Review of Economics 
and Statistics, Forthcoming . 

Conley, T. G., C. B. Hansen, R. E. McCulloch, and P. E. Rossi (2008). A semi-parametric
Bayesian approach to the instrumental variable problem. Journal of Econometrics 144,
276-305.

Dickey, J. M. and E. Gunel (1978). Bayes factors from mixed probabilities. J. R. Statist. 
Soc. B 40, 43-46. 

Dreze, J. H. (1976). Bayesian limited information analysis of the simultaneous equations
model. Econometrica 44, 1045-1075.

Durlauf, S., A. Kourtellos, and G. M. Tan (2008). Are any growth theories robust? Economic 
Journal 118, 329-346. 

Durlauf, S. N., A. Kourtellos, and G. M. Tan (2011). Is God in the details? A reexamination
of the role of religion in economic growth. Journal of Applied Econometrics, Forthcoming.

Eicher, T. S., C. Papageorgiou, and A. E. Raftery (2011). Default priors and predictive
performance in Bayesian model averaging, with application to growth determinants. Journal
of Applied Econometrics 26, 30-55.

Fernandez, C., E. Ley, and M. Steel (2001). Benchmark priors for Bayesian model averaging.
Journal of Econometrics 100, 381-427.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian
model determination. Biometrika 82, 711-732.

Holmes, C., D. Denison, and B. Mallick (2002). Bayesian model order determination and
basis selection for seemingly unrelated regression. Journal of Computational and Graphical
Statistics 11, 533-551.






Kass, R. E. and A. E. Raftery (1995). Bayes factors. Journal of the American Statistical 
Association 90, 773-795. 

Kass, R. E. and L. Wasserman (1995). A reference test for nested hypotheses with large 
samples. Journal of the American Statistical Association 90, 928-934. 

Kleibergen, F. and H. van Dijk (1998). Bayesian simultaneous equations analysis using 
reduced rank structures. Econometric Theory 111, 223-249. 

Kleibergen, F. and E. Zivot (2003). Bayesian and classical approaches to instrumental 
variable regression. Journal of Econometrics 114, 29-72. 

Koop, G., R. Leon-Gonzalez, and R. Strachan (2012). Bayesian model averaging in the
instrumental variable regression model. Journal of Econometrics, Forthcoming.

Lenkoski, A., T. S. Eicher, and A. E. Raftery (2011). Two-stage Bayesian model averaging 
in endogenous variable models. Econometric Reviews, Forthcoming. 

Ley, E. and M. Steel (2009). Journal of Applied Econometrics 24, 651-674. 

Madigan, D., J. York, and D. Allard (1995). Bayesian graphical models for discrete data.
International Statistical Review 63, 215-232.

Moral-Benito, E. (2009). Predetermined variables: Likelihood-based estimation and
Bayesian averaging. CEMFI working paper.

Rodrik, D., A. Subramanian, and F. Trebbi (2004). Institutions rule: The primacy of 
institutions over geography and integration in economic development. Journal of Economic 
Growth 9, 131-165. 

Rossi, P. E., G. M. Allenby, and R. McCulloch (2006). Bayesian Statistics and Marketing.
New York: Wiley.

Sargan, J. D. (1958). The estimation of economic relationships with instrumental variables. 
Econometrica 26, 393-415. 

Scott, J. G. and J. O. Berger (2006). An exploration of aspects of Bayesian multiple testing. 
J. Statist. Plan. Infer. 136, 2144-2162. 

Strachan, R. and B. Inder (2004). Bayesian analysis of the error correction model. Journal 
of Econometrics 123, 307-325. 






Appendix A 



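Details of the determination of $pr(\rho \mid \lambda, \Sigma, D)$

Our derivation closely follows Rossi et al. (2006), extended to the multivariate setting.

Set $V = [X\ W]$. Then, conditional on $\Sigma$ and $\lambda$, we have

$$
Y = V\rho + \varepsilon = V\rho + \frac{\sigma_{21}}{\sigma_{22}}\,\eta + \nu_{1|2},
$$

where $\eta$ is derived from $\lambda$ and the data $D$, and $(\nu_{1|2})_i \sim \mathcal{N}(0, \psi)$
with $\psi = \sigma_{11} - \sigma_{12}^2/\sigma_{22}$. Replacing $Y$ by
$\tilde{Y} = Y - (\sigma_{21}/\sigma_{22})\,\eta$ yields

$$
\tilde{Y} = V\rho + \nu_{1|2}.
$$

We now compute

$$
\begin{aligned}
pr(\rho \mid \tilde{Y}, V) &\propto pr(\tilde{Y} \mid \rho, V)\, pr(\rho) \\
&\propto \exp\left(-\frac{1}{2\psi}(\tilde{Y} - V\rho)'(\tilde{Y} - V\rho) - \frac{1}{2}\rho'\rho\right) \\
&\propto \exp\left(-\frac{1}{2}\left[-2\psi^{-1}\tilde{Y}'V\rho + \rho'\left(I_{1+p} + \psi^{-1}V'V\right)\rho\right]\right).
\end{aligned}
$$

Setting $H = I_{1+p} + \psi^{-1}V'V$ and $\hat{\rho} = \psi^{-1}H^{-1}V'\tilde{Y}$, this becomes

$$
pr(\rho \mid \tilde{Y}, V) \propto \exp\left(-\frac{1}{2}\left[-2\hat{\rho}'H\rho + \rho'H\rho\right]\right).
$$

Thus, we conclude

$$
\rho \mid \lambda, \Sigma, D \sim \mathcal{N}\left(\hat{\rho}, H^{-1}\right),
$$

which confirms (2.4).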
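A minimal NumPy sketch of this step, assuming the $\mathcal{N}(0, I)$ prior on $\rho$ used in the
derivation; the function name and argument layout are illustrative and are not the ivbma
package interface. It forms $\tilde{Y}$ and $\psi$, builds $H$ and $\hat{\rho}$, and draws from
$\mathcal{N}(\hat{\rho}, H^{-1})$ through a Cholesky factor of $H$.

```python
import numpy as np

def draw_rho(Y, X, W, eta, Sigma, rng):
    """Draw rho = (beta, gamma) from N(rho_hat, H^{-1}) given lambda and Sigma.

    Y, X: length-n outcome and endogenous regressor; W: (n, p) covariates.
    eta: first-stage residuals X - Z delta - W tau at the current lambda.
    Sigma: 2x2 error covariance [[s11, s12], [s12, s22]].
    Assumes the N(0, I) prior on rho used in the derivation above.
    """
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    psi = s11 - s12**2 / s22                 # Var(eps_i | eta_i)
    Y_tilde = Y - (s12 / s22) * eta          # remove the part of eps predicted by eta
    V = np.column_stack([X, W])              # V = [X W], with 1 + p columns
    H = np.eye(V.shape[1]) + V.T @ V / psi
    rho_hat = np.linalg.solve(H, V.T @ Y_tilde) / psi
    L = np.linalg.cholesky(H)                # H = L L', L lower triangular
    z = rng.standard_normal(V.shape[1])
    # If z ~ N(0, I), then (L')^{-1} z ~ N(0, H^{-1})
    return rho_hat + np.linalg.solve(L.T, z)
```

Solving with the Cholesky factor avoids forming $H^{-1}$ explicitly, which is the usual way to
sample from a Gaussian specified by its precision matrix.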



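Details of the determination of $pr(\lambda \mid \rho, \Sigma, D)$

We now provide a detailed derivation of (2.5). Inserting (2.2) into (2.1) leads to

$$
Y = Z\delta\beta + W\tau\beta + W\gamma + \beta\eta + \varepsilon.
$$

Conditioning on $\beta$ and $\gamma$, we set $Y^* = \beta^{-1}(Y - W\gamma)$ and obtain
$Y^* = Z\delta + W\tau + \vartheta$ with $\vartheta = \eta + \beta^{-1}\varepsilon$. Further,
$\vartheta_i \sim \mathcal{N}(0, \zeta)$, with $\zeta = \sigma_{22} + \beta^{-2}\sigma_{11} + 2\beta^{-1}\sigma_{21}$.

We can now write this as a regression system in which the number of observations has been
doubled to $2n$,

$$
\begin{pmatrix} Y^* \\ X \end{pmatrix} =
\begin{pmatrix} Z & W \\ Z & W \end{pmatrix}\lambda +
\begin{pmatrix} \vartheta \\ \eta \end{pmatrix}, \tag{A-1}
$$

with

$$
\begin{pmatrix} \vartheta_i \\ \eta_i \end{pmatrix} \sim \mathcal{N}_2(0, \Phi), \qquad
\Phi = \begin{pmatrix}
\sigma_{22} + \beta^{-2}\sigma_{11} + 2\beta^{-1}\sigma_{12} & \sigma_{22} + \beta^{-1}\sigma_{12} \\
\sigma_{22} + \beta^{-1}\sigma_{12} & \sigma_{22}
\end{pmatrix}.
$$

Let $\Theta$ be the Cholesky factor of $\Phi$, so that $\Phi = \Theta'\Theta$ with $\Theta$ upper
triangular. We then post-multiply two copies of each component in Equation (A-1) by
$\Theta^{-1}$ to obtain a regression system with unit covariance matrix for the error terms.
Let

$$
\begin{aligned}
[\tilde{Y}^*\ \tilde{X}] &= [Y^*\ X]\,\Theta^{-1}, \\
[\tilde{Z}_j\ \check{Z}_j] &= [Z_j\ Z_j]\,\Theta^{-1}, \quad j = 1, \ldots, q, \\
[\tilde{W}_j\ \check{W}_j] &= [W_j\ W_j]\,\Theta^{-1}, \quad j = 1, \ldots, p, \\
[\tilde{\vartheta}\ \tilde{\eta}] &= [\vartheta\ \eta]\,\Theta^{-1}.
\end{aligned}
$$

This yields

$$
\begin{pmatrix} \tilde{Y}^* \\ \tilde{X} \end{pmatrix} =
\begin{pmatrix} \tilde{Z} & \tilde{W} \\ \check{Z} & \check{W} \end{pmatrix}\lambda +
\begin{pmatrix} \tilde{\vartheta} \\ \tilde{\eta} \end{pmatrix}
\quad \text{with} \quad
\begin{pmatrix} \tilde{\vartheta} \\ \tilde{\eta} \end{pmatrix} \sim \mathcal{N}_{2n}(0, I_{2n}).
$$

Set $S = [\tilde{Y}^{*\prime}\ \tilde{X}']'$ and
$T = \begin{pmatrix} \tilde{Z} & \tilde{W} \\ \check{Z} & \check{W} \end{pmatrix}$.
The posterior distribution of $\lambda$ is determined by the same logic as in Step 1 and gives

$$
\lambda \mid \rho, \Sigma, D \sim \mathcal{N}\left(\hat{\lambda}, \Omega^{-1}\right),
$$

where $\Omega = I_{p+q} + T'T$ and $\hat{\lambda} = \Omega^{-1}T'S$.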
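The stacking and whitening can be sketched in a few lines of NumPy. This is an illustration
under the same $\mathcal{N}(0, I)$ prior on $\lambda$, with a hypothetical function name. It uses
the fact that $[U_j\ U_j]\,\Theta^{-1} = U_j\,([1\ 1]\,\Theta^{-1})$, so each regressor
contributes one scaled copy to each half of the stacked $2n$-row design $T$.

```python
import numpy as np

def draw_lambda(Y, X, Z, W, beta, gamma, Sigma, rng):
    """Draw lambda = (delta, tau) from N(lambda_hat, Omega^{-1}) given rho and Sigma.

    Builds the stacked 2n-row system (A-1), whitens it with the Cholesky
    factor of Phi, and assumes the N(0, I) prior on lambda used above.
    """
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    Y_star = (Y - W @ gamma) / beta
    zeta = s22 + s11 / beta**2 + 2.0 * s12 / beta     # Var(theta_i)
    c = s22 + s12 / beta                               # Cov(theta_i, eta_i)
    Phi = np.array([[zeta, c], [c, s22]])
    Theta = np.linalg.cholesky(Phi).T                  # upper triangular, Phi = Theta' Theta
    Theta_inv = np.linalg.inv(Theta)
    U = np.column_stack([Z, W])                        # regressors shared by both blocks
    # Post-multiplying each row [Y*_i, X_i] by Theta^{-1} gives unit-variance errors
    resp = np.column_stack([Y_star, X]) @ Theta_inv
    a = np.ones(2) @ Theta_inv                         # [1, 1] Theta^{-1}
    S = np.concatenate([resp[:, 0], resp[:, 1]])
    T = np.vstack([a[0] * U, a[1] * U])
    Omega = np.eye(U.shape[1]) + T.T @ T
    lam_hat = np.linalg.solve(Omega, T.T @ S)
    L = np.linalg.cholesky(Omega)                      # Omega = L L'
    return lam_hat + np.linalg.solve(L.T, rng.standard_normal(U.shape[1]))
```

Note that `np.linalg.cholesky` returns the lower-triangular factor, so its transpose is the
upper-triangular $\Theta$ with $\Phi = \Theta'\Theta$.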



Appendix B 

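Calculation of $CBF_{sec}$

Note that it is immediate from the work above that

$$
\rho_L \mid L, \lambda, \Sigma, D \sim \mathcal{N}_l\left(\hat{\rho}_L, H_L^{-1}\right),
$$

with $H_L = I_l + \psi^{-1}V_L'V_L$, $\hat{\rho}_L = \psi^{-1}H_L^{-1}V_L'\tilde{Y}$ and

$$
V_L = [X_L\ W_L],
$$

where $l$ is the size of model $L$, and $X_L$ and $W_L$ denote the columns of the matrices $X$
and $W$ contained in the model $L$.

Now, consider $pr(\tilde{Y} \mid L, \lambda, \Sigma)$. Note that

$$
pr(\tilde{Y} \mid L, \lambda, \Sigma) = \int pr(\tilde{Y} \mid \rho, \lambda, \Sigma)\, pr(\rho \mid L)\, d\rho.
$$

Following the calculation in Appendix A, we have

$$
pr(\tilde{Y} \mid \rho, \lambda, \Sigma)\, pr(\rho \mid L) \propto
(2\pi)^{-l/2} \exp\left(-\frac{1}{2}\left[-2\hat{\rho}_L'H_L\rho + \rho'H_L\rho\right]\right),
$$

where the omitted factor $(2\pi\psi)^{-n/2}\exp\left(-\tilde{Y}'\tilde{Y}/(2\psi)\right)$ does not
depend on $L$ and therefore cancels in any conditional Bayes factor. Substituting this above
leads to

$$
pr(\tilde{Y} \mid L, \lambda, \Sigma) \propto (2\pi)^{-l/2}
\int \exp\left(-\frac{1}{2}\left[-2\hat{\rho}_L'H_L\rho + \rho'H_L\rho\right]\right) d\rho,
$$

which can be expanded to

$$
(2\pi)^{-l/2} \exp\left(\frac{1}{2}\hat{\rho}_L'H_L\hat{\rho}_L\right)
\int \exp\left(-\frac{1}{2}\left[\hat{\rho}_L'H_L\hat{\rho}_L - 2\hat{\rho}_L'H_L\rho + \rho'H_L\rho\right]\right) d\rho. \tag{B-1}
$$

In this form, the integral in (B-1) represents the normalizing constant of a
$\mathcal{N}_l(\hat{\rho}_L, H_L^{-1})$ distribution, i.e.

$$
I = \int \exp\left(-\frac{1}{2}\left[\hat{\rho}_L'H_L\hat{\rho}_L - 2\hat{\rho}_L'H_L\rho + \rho'H_L\rho\right]\right) d\rho
= (2\pi)^{l/2}\,|H_L|^{-1/2}.
$$

Thus,

$$
pr(\tilde{Y} \mid L, \lambda, \Sigma) \propto |H_L|^{-1/2} \exp\left(\frac{1}{2}\hat{\rho}_L'H_L\hat{\rho}_L\right).
$$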
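The closed form is conveniently evaluated on the log scale, and a log $CBF_{sec}$ between two
models is the difference of two such values. The sketch below uses hypothetical function
names; models are given as lists of column indices into $V = [X\ W]$, and $\tilde{Y}$ and $\psi$
are taken as already computed for the current $\lambda$ and $\Sigma$.

```python
import numpy as np

def log_marginal_sec(Y_tilde, V_L, psi):
    """log pr(Y_tilde | L, lambda, Sigma), up to a constant shared by all models L."""
    l = V_L.shape[1]
    H = np.eye(l) + V_L.T @ V_L / psi
    rho_hat = np.linalg.solve(H, V_L.T @ Y_tilde) / psi
    _, logdet = np.linalg.slogdet(H)                   # log |H_L|
    return -0.5 * logdet + 0.5 * rho_hat @ H @ rho_hat

def log_cbf_sec(Y_tilde, V, cols_a, cols_b, psi):
    """log conditional Bayes factor of model a (columns cols_a of V) versus model b."""
    return (log_marginal_sec(Y_tilde, V[:, cols_a], psi)
            - log_marginal_sec(Y_tilde, V[:, cols_b], psi))
```

Because $\tilde{Y} = V_L\rho + \nu_{1|2}$ with $\rho \sim \mathcal{N}(0, I_l)$ implies
$\tilde{Y} \sim \mathcal{N}(0, \psi I_n + V_LV_L')$, this value can be checked against the exact
Gaussian marginal likelihood: the two log-ratios must agree, since the omitted constant cancels.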



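Calculation of $CBF_{1st}$

Provided that $\rho_1$ is not required to be zero (thus, the endogenous variable is included in
model $L$), we have that

$$
\lambda_M \mid M, \rho, \Sigma, D \sim \mathcal{N}_m\left(\hat{\lambda}_M, \Omega_M^{-1}\right),
$$

where

$$
\hat{\lambda}_M = \Omega_M^{-1}T_M'S, \qquad \Omega_M = I_m + T_M'T_M,
$$

where $m$ is the size of model $M$ and $T_M$ is the matrix $T$ defined above, but restricted to
those variables contained in $M$.

When $\rho_1 = 0$, equivalently when the endogenous variable is not contained in the model $L$,
the approach is altered and essentially becomes a seemingly unrelated regression. Let
$U_M = [Z_M\ W_M]$; then we have

$$
X = U_M\lambda_M + \frac{\sigma_{21}}{\sigma_{11}}\,\varepsilon + \nu_{2|1},
$$

where $(\nu_{2|1})_i \sim \mathcal{N}(0, \omega)$ with $\omega = \sigma_{22} - \sigma_{21}^2/\sigma_{11}$.
Setting $X^* = X - (\sigma_{21}/\sigma_{11})\,\varepsilon$ we have

$$
X^* = U_M\lambda_M + \nu_{2|1},
$$

and by analogy to the steps in Appendix A we see that in this case

$$
\hat{\lambda}_M = \omega^{-1}\Omega_M^{-1}U_M'X^*, \qquad \Omega_M = I_m + \omega^{-1}U_M'U_M.
$$

Regardless of how $\hat{\lambda}_M$ and $\Omega_M$ are calculated, the steps outlining the
determination of $CBF_{sec}$ may be followed in this case as well, and we see that

$$
pr(X \mid M, \rho, \Sigma) \propto |\Omega_M|^{-1/2}\exp\left(\frac{1}{2}\hat{\lambda}_M'\Omega_M\hat{\lambda}_M\right).
$$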
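With these closed forms, a model move inside the Gibbs sampler needs only a ratio of two
conditional marginal likelihoods. The following sketch shows one add/delete move under two
assumptions of ours: a uniform prior over models and a proposal that toggles a single
variable chosen uniformly at random, i.e. an MC3-type move in the sense of Madigan et al.
(1995). The model priors and proposals used by IVBMA itself may differ; `log_marginal` is a
hypothetical callable, for instance one of the two expressions above with the remaining
parameters held at their current values.

```python
import numpy as np

def mc3_step(model, log_marginal, rng):
    """One add/delete model move accepted via a conditional Bayes factor.

    model: boolean array marking the variables currently in the model.
    log_marginal: callable returning the conditional log marginal likelihood
        of a model, up to a model-independent constant.
    Assumes a uniform model prior and a symmetric single-variable toggle proposal,
    so the Metropolis-Hastings acceptance probability is min(1, CBF).
    """
    proposal = model.copy()
    j = rng.integers(model.size)
    proposal[j] = not proposal[j]
    log_cbf = log_marginal(proposal) - log_marginal(model)
    if np.log(rng.random()) < log_cbf:
        return proposal
    return model
```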

Supplementary Simulation Study 

We conduct a simulation study to evaluate the properties of IVBMA and compare its
performance to the Gibbs sampler discussed in Section 2.2 that does not incorporate model
uncertainty (which we refer to as IV). Our study is similar to that of Lenkoski et al. (2011).

Using the framework in (2.1)-(2.3), we consider p = 15 variables in W, q = 10 possible
instruments in Z and a univariate endogenous regressor X. For simulating data we use
n = 120. These sizes approximately resemble the structure of the data set we will examine in
Section ??. In each synthetic dataset we construct, the values in W and Z are individually
sampled from a N(0, 1).

The variables Y and X are determined by

$$
\begin{aligned}
Y &= 1.5X + 2W_1 + 1.4W_4 + 2.7W_8 + 1.25W_9 + 3.3W_{13} + \varepsilon, \\
X &= 4.1Z_3 + 1.2Z_7 + 3Z_8 + 0.9Z_{10} + 2.5W_2 + 1.7W_9 + 0.8W_{13} + \eta.
\end{aligned}
$$
                 First Stage                  Second Stage
Variable     IV       IVBMA    True       IV       IVBMA    True
X*           -        -        -          1.498    1.497    1.5
W1*          -0.015   -0.003   0          1.986    1.991    2
W2*          2.476    2.480    2.5        -0.007   -0.002   0
W3           0.008    0.002    0          0.001    0.000    0
W4*          -0.008   0.001    0          1.379    1.384    1.4
W5           -0.007   0.001    0          -0.001   0.000    0
W6           0.004    0.000    0          0.004    0.000    0
W7           0.000    0.000    0          -0.006   -0.001   0
W8*          -0.020   -0.005   0          2.663    2.669    2.7
W9*          1.682    1.684    1.7        1.226    1.230    1.25
W10          -0.004   -0.003   0          0.010    0.002    0
W11          -0.002   0.000    0          0.006    0.003    0
W12          -0.004   -0.002   0          -0.017   -0.001   0
W13*         0.789    0.792    0.8        3.265    3.268    3.3
W14          -0.001   0.000    0          -0.010   0.001    0
W15          -0.003   0.000    0          -0.004   0.000    0
Z1           0.011    0.001    0          -        -        -
Z2           -0.001   -0.002   0          -        -        -
Z3*          4.056    4.060    4          -        -        -
Z4           -0.004   0.001    0          -        -        -
Z5           0.002    0.000    0          -        -        -
Z6           0.002    0.001    0          -        -        -
Z7*          1.193    1.195    1.2        -        -        -
Z8*          2.977    2.976    3          -        -        -
Z9           0.002    0.002    0          -        -        -
Z10*         0.894    0.896    0.9        -        -        -
MSE          991.92   541.14              943.79   541.14

Table 5: Comparison of parameter estimation under IV and IVBMA across 200 repetitions.
Variables marked with an asterisk are those included in either the first or the second stage.
The values of the total average MSE imply that IVBMA leads to a lower variance in the
parameter estimation than IV.
Hence, besides X, five regressors of the vector W have an influence on Y. Two of these
variables also have explanatory power for X, which additionally depends on one further
component of W, namely W2. Moreover, only four out of ten candidate variables in Z serve as
instruments for X, while the rest have no explanatory power.

Finally, we sample the error terms ε and η from a multivariate normal distribution with a
non-diagonal covariance matrix Σ,

$$
\begin{pmatrix} \varepsilon_i \\ \eta_i \end{pmatrix} \sim \mathcal{N}_2\left(0, \Sigma\right).
$$

In the following, we use S = 50,000 as the number of iterations for both methods and discard
the first 10,000 samples as burn-in. The results are averaged over 200 replications. Each
replication took approximately 45 seconds on a quad-core 2.8 GHz desktop computer with
16 GB RAM running Linux.
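One synthetic dataset from this design can be generated as sketched below. The coefficients
follow the two equations for Y and X above; the covariance matrix Σ of (ε, η) is a
hypothetical placeholder chosen only to be non-diagonal, since its values are not reproduced
here, and the function name is likewise illustrative.

```python
import numpy as np

def simulate_dataset(rng, n=120, p=15, q=10, Sigma=None):
    """One synthetic dataset from the simulation design described above."""
    if Sigma is None:
        Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])   # hypothetical, not the study's value
    W = rng.standard_normal((n, p))
    Z = rng.standard_normal((n, q))
    E = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    eps, eta = E[:, 0], E[:, 1]
    # Columns are 0-based: W[:, 0] is W1, Z[:, 2] is Z3, and so on.
    X = (4.1 * Z[:, 2] + 1.2 * Z[:, 6] + 3.0 * Z[:, 7] + 0.9 * Z[:, 9]
         + 2.5 * W[:, 1] + 1.7 * W[:, 8] + 0.8 * W[:, 12] + eta)
    Y = (1.5 * X + 2.0 * W[:, 0] + 1.4 * W[:, 3] + 2.7 * W[:, 7]
         + 1.25 * W[:, 8] + 3.3 * W[:, 12] + eps)
    return Y, X, W, Z
```

With a non-diagonal Σ, ordinary least squares of Y on [X W] is biased for the coefficient of
X, which is what the IV and IVBMA samplers are designed to correct.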

Table 5 displays the results of parameter estimation for the two methods. For each replicate
we calculate the posterior expected values $\bar{\lambda} = S^{-1}\sum_{s=1}^{S}\lambda^{(s)}$ and
$\bar{\rho} = S^{-1}\sum_{s=1}^{S}\rho^{(s)}$. The table then reports the median of these
estimates for each variable across the 200 replicates. Finally, for each replicate we computed
the mean squared error (MSE) of the posterior expectations $\bar{\lambda}$ and $\bar{\rho}$ and
report the average of this over all replicates. We can see that for each stage the medians of
the posterior expectations under both IVBMA and IV are close to the true parameter values.
However, based on the MSE reported in the last row of Table 5, IVBMA leads to considerably
lower deviation from the true values than IV estimation. This is because model determination
focuses better on the variables that have explanatory power for the outcome, as can be seen
from the inclusion probabilities shown in Table 6. This table shows the median and
interquartile range of the inclusion probabilities over all 200 replications. Variables that
are included in the model are almost always given inclusion probabilities near 1, while those
not in the model typically have very low inclusion probabilities.
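The across-replicate summaries can be sketched as follows, assuming the posterior means of
each replicate are stacked in a hypothetical array with one row per replicate. The MSE here
averages squared errors over coefficients within a replicate; this normalization is our
assumption, so the values it produces need not be on the scale of the MSE row in Table 5.

```python
import numpy as np

def summarize_replicates(post_means, truth):
    """Per-coefficient median and average per-replicate MSE across replicates.

    post_means: (n_reps, n_coef) posterior means, one row per replicate.
    truth: (n_coef,) true coefficients.
    """
    post_means = np.asarray(post_means, dtype=float)
    median_est = np.median(post_means, axis=0)             # median across replicates
    mse = np.mean((post_means - truth) ** 2, axis=1)       # one MSE per replicate (assumed form)
    return median_est, mse.mean()
```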
Variable     First Stage             Second Stage
             Median (IQR)            Median (IQR)
W9*                                  0.999 (0.993,1.000)
W10          0.106 (0.067,0.521)     0.107 (0.069,0.507)
W11          0.102 (0.067,0.516)     0.111 (0.070,0.447)
W12          0.104 (0.067,0.464)     0.101 (0.070,0.458)
W13*         0.999 (0.995,1.000)     1.000 (0.996,1.000)
W14          0.101 (0.070,0.475)     0.099 (0.070,0.458)
W15          0.096 (0.068,0.313)     0.101 (0.073,0.308)
Z1           0.103 (0.065,0.403)     -
Z2           0.105 (0.071,0.507)     -
Z3*          0.999 (0.987,1.000)     -
Z4           0.109 (0.071,0.569)     -
Z5           0.097 (0.065,0.483)     -
Z6           0.100 (0.068,0.722)     -
Z7*          0.999 (0.984,1.000)     -
Z8*          0.999 (0.988,1.000)     -
Z9           0.104 (0.070,0.532)     -
Z10*         0.999 (0.985,1.000)     -

Table 6: Median and IQR of variable inclusion probabilities across 200 repetitions. Variables
marked with an asterisk are those included in either the first or the second stage. This
table shows that inclusion probabilities closely match the true structure of the system.